

feat: Arkham integration - Address enrichment service #1573

Merged
pikonha merged 31 commits into dev from feat/address-enrichment on Feb 20, 2026


@alextnetto
Member

Summary

New microservice (apps/address-enrichment) that enriches Ethereum addresses with identity data from Arkham Intel API.

Features

  • REST API to lookup address labels, entity info, and contract/EOA type
  • Permanent PostgreSQL storage (fetch once, store forever - protects against API access loss)
  • Batch endpoint for resolving up to 100 addresses at once
  • CLI sync command to pre-populate top delegates and token holders from Anticapture API
  • Gateway integration - exposed via GraphQL Mesh

Endpoints

  • GET /address/:address - Single address lookup
  • POST /addresses - Batch lookup (max 100)

Tech Stack

Hono, Drizzle ORM, viem, Arkham Intel API

New Environment Variables

- Hono REST API with GET /address/:address endpoint
- Arkham Intel API integration for entity/label data
- EOA/contract detection (Arkham first, falling back to RPC)
- PostgreSQL permanent storage via Drizzle ORM
- CLI sync command to batch-fetch top delegates and holders
- OpenAPI/Swagger documentation at /docs
- Fix GraphQL BigInt values (use strings instead of numbers)
- Show voting power and balance with K/M/B formatting
- Display Arkham data (entity, label, type) for all addresses
- Show contract indicator for contract addresses
- POST /addresses accepts array of up to 100 addresses
- Processes in parallel with concurrency limit of 10
- Returns results array and errors array for partial failures
- Deduplicates addresses automatically
- Add address-enrichment as OpenAPI source in GraphQL Mesh
- Exposes getAddressEnrichment and batchAddressEnrichment queries
- Service is optional (only loaded if ADDRESS_ENRICHMENT_API_URL is set)
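The batch behavior listed above (dedupe, concurrency limit of 10, separate results and errors arrays) can be sketched roughly as follows. This is not the controller's code: enrichBatch, BatchResponse, and the enrich callback are hypothetical names, and both the case-insensitive dedupe and checking the 100-address limit after dedupe are assumptions.

```typescript
// Hypothetical response shape: successes and per-address failures side by side.
interface BatchResponse<R> {
  results: R[];
  errors: { address: string; error: string }[];
}

const MAX_BATCH_SIZE = 100; // limit stated in the PR description
const CONCURRENCY = 10; // concurrency stated in the PR description

async function enrichBatch<R>(
  addresses: string[],
  enrich: (address: string) => Promise<R>,
): Promise<BatchResponse<R>> {
  // Assumption: dedupe is case-insensitive, since checksummed and lowercase
  // spellings of an address refer to the same account.
  const unique = Array.from(new Set(addresses.map((a) => a.toLowerCase())));
  if (unique.length > MAX_BATCH_SIZE) {
    throw new Error(`At most ${MAX_BATCH_SIZE} addresses per request`);
  }

  const response: BatchResponse<R> = { results: [], errors: [] };
  // Fixed-size chunks: at most CONCURRENCY lookups are in flight at once.
  for (let i = 0; i < unique.length; i += CONCURRENCY) {
    const chunk = unique.slice(i, i + CONCURRENCY);
    const settled = await Promise.allSettled(chunk.map((address) => enrich(address)));
    settled.forEach((result, j) => {
      if (result.status === "fulfilled") {
        response.results.push(result.value);
      } else {
        // A fixed message keeps internal error details out of the response.
        response.errors.push({ address: chunk[j], error: "Failed to enrich address" });
      }
    });
  }
  return response;
}
```

Promise.allSettled is what makes partial failure possible here: one rejected lookup does not discard the other results in its chunk.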
@vercel

vercel bot commented Jan 23, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| anticapture | Ready | Preview, Comment | Feb 20, 2026 7:10pm |
| anticapture-storybook | Ready | Preview, Comment | Feb 20, 2026 7:10pm |


pikonha marked this pull request as draft on January 26, 2026 20:13
@pikonha
Member

pikonha commented Feb 20, 2026

@claude review

@claude

claude bot commented Feb 20, 2026

Claude finished @pikonha's task in 1m 54s.


Code Review — feat: Arkham integration - Address enrichment service

  • Read changed files
  • Analyze code quality, architecture, and security
  • Post review feedback

The existing review from @pikonha already covers many important issues. I'll focus on items not yet raised and add detail to a few that warrant more attention.


Critical Issues

1. No Tests

Zero test coverage on business-critical code. The most important missing tests:

  • EnrichmentService.getAddressEnrichment — the ENS staleness/TTL logic, the race condition fallback, the "stale but Arkham data exists" path, and the case where mapToResult is called with all-null Arkham fields
  • isContract: on an RPC error it defaults to false, and that false result is then cached permanently
  • AnticaptureClient streams — verify the exit condition when items.length < pageSize
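As an example of what the first group of tests could target, here is a minimal sketch of a pure freshness check. isEnsFresh is named later in this review, but this signature and the 24-hour ENS_TTL_MS value are assumptions for illustration; the real helper and TTL may differ.

```typescript
// Assumption for illustration: a 24-hour TTL. The real value is not shown in the review.
const ENS_TTL_MS = 24 * 60 * 60 * 1000;

// Hypothetical signature: true when cached ENS data can be served without refetching.
// A null timestamp (e.g. a record created by the sync script) is never fresh.
function isEnsFresh(
  ensUpdatedAt: Date | null,
  now: Date = new Date(),
  ttlMs: number = ENS_TTL_MS,
): boolean {
  if (ensUpdatedAt === null) return false;
  return now.getTime() - ensUpdatedAt.getTime() < ttlMs;
}
```

Passing `now` explicitly keeps the check deterministic, so tests can cover a null timestamp, a timestamp just under the TTL, and one exactly at the TTL boundary without mocking the clock.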

2. runMigrations Uses execSync + --force at Startup

src/db/helpers.ts:30

execSync("drizzle-kit push --force", {
  env: { ...process.env, DATABASE_URL: connectionString },
  stdio: "inherit",
});

This runs synchronously (blocking the event loop) every time the service starts. --force bypasses safety confirmations and can silently drop columns or tables in destructive migrations. This pattern is inappropriate for a production service; standard practice is to run migrations as a separate deploy step or use drizzle-orm/migrator with committed migration files (which is what the drizzle/ directory appears to be for).
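A minimal sketch of the migrator approach, assuming the service uses the node-postgres (pg) driver and that drizzle/ holds SQL migrations generated ahead of time with drizzle-kit's generate command. If the service uses postgres-js instead, the imports would come from drizzle-orm/postgres-js. The function is meant to be called from a separate deploy step, not at service startup.

```typescript
import { drizzle } from "drizzle-orm/node-postgres";
import { migrate } from "drizzle-orm/node-postgres/migrator";
import { Pool } from "pg";

// Applies the committed SQL files in ./drizzle (folder path is an assumption).
// Async, so nothing blocks the event loop, and only changes written down in
// reviewed migration files are applied (no implicit schema diff, no --force).
async function runMigrations(connectionString: string): Promise<void> {
  const pool = new Pool({ connectionString });
  try {
    await migrate(drizzle(pool), { migrationsFolder: "./drizzle" });
  } finally {
    // Release the connection pool even if a migration fails.
    await pool.end();
  }
}
```

With this in place, drizzle-kit is only needed when generating migrations, not in the runtime image.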

3. mapToResult: arkham Is Never null Despite Schema Allowing It

src/services/enrichment.ts:177

arkham: {
  entity: record.arkhamEntity,
  entityType: record.arkhamEntityType,
  label: record.arkhamLabel,
  twitter: record.arkhamTwitter,
},

The schema defines arkham as nullable, but mapToResult always returns an object (even when all four fields are null). This means API consumers cannot distinguish "address not found in Arkham" from "Arkham returned data but all fields were empty." The response differs from the OpenAPI schema contract. Consider returning null when all arkham fields are null.
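One way to apply that suggestion, sketched under the assumption that the stored record exposes exactly the four arkham* columns shown above. toArkhamResult, ArkhamColumns, and ArkhamResult are hypothetical names.

```typescript
// Hypothetical shapes mirroring the snippet above; the real record type comes
// from the Drizzle schema and has more columns.
interface ArkhamColumns {
  arkhamEntity: string | null;
  arkhamEntityType: string | null;
  arkhamLabel: string | null;
  arkhamTwitter: string | null;
}

interface ArkhamResult {
  entity: string | null;
  entityType: string | null;
  label: string | null;
  twitter: string | null;
}

// Returns null when Arkham had nothing for this address, matching the
// nullable `arkham` field in the OpenAPI schema.
function toArkhamResult(record: ArkhamColumns): ArkhamResult | null {
  const arkham: ArkhamResult = {
    entity: record.arkhamEntity,
    entityType: record.arkhamEntityType,
    label: record.arkhamLabel,
    twitter: record.arkhamTwitter,
  };
  const hasAnyField = [arkham.entity, arkham.entityType, arkham.label, arkham.twitter]
    .some((value) => value !== null);
  return hasAnyField ? arkham : null;
}
```

mapToResult would then set `arkham: toArkhamResult(record)`.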


Bugs

4. normalizedAddress as Address Skips Validation in Sync Script

src/scripts/sync-top-addresses.ts:124

isContractAddress = await isContract(
  rpcClient,
  normalizedAddress as Address,  // ← unsafe cast
);

The service correctly uses getAddress(normalizedAddress), which validates and checksums, but the sync script bypasses this with a type assertion. A malformed address from the API would reach the RPC call and produce a confusing error (possibly caught and silently defaulted to false).
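A sketch of the validated path for the sync script, using viem's getAddress, which throws on malformed input and otherwise returns the EIP-55 checksummed form. parseAddressOrNull is a hypothetical helper; returning null lets the sync loop log and skip a bad row instead of sending it to the RPC.

```typescript
import { getAddress, type Address } from "viem";

// Hypothetical helper: validates an address string from the Anticapture API and
// returns its checksummed form, or null if it is malformed.
function parseAddressOrNull(raw: string): Address | null {
  try {
    // getAddress throws when `raw` is not a valid 20-byte hex address.
    return getAddress(raw);
  } catch {
    return null;
  }
}
```

In the sync loop, the `as Address` cast would then become a null check followed by `isContract(rpcClient, address)`.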

5. Sync Script Doesn't Populate ENS Data

enrichAddress() in the sync script only fetches Arkham data. Records created via sync will have ensUpdatedAt = null, which causes isEnsFresh to return false, triggering an ENS fetch on the very first API request for every synced address. This defeats part of the point of pre-populating. Given that the sync processes potentially hundreds of thousands of addresses, the first wave of API requests will trigger a thundering herd of ENS calls.

6. Batch Endpoint Also Leaks Error Messages

src/controllers/address.ts:163

error: result.reason instanceof Error
  ? result.reason.message
  : "Unknown error",

The 500 handler in index.ts was noted by @pikonha, but the batch endpoint also forwards raw error messages to API consumers, and those messages may include database connection details or other internal information.
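A sketch of a safer alternative: log the full error server-side and return a fixed message, except for errors deliberately marked as safe to expose. PublicError and toPublicError are hypothetical names, not existing code in the service.

```typescript
// Hypothetical error type for failures whose message is safe to show API clients
// (e.g. "Invalid address"). Anything else is treated as internal.
class PublicError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "PublicError";
    // Keeps `instanceof` working when TypeScript compiles to ES5.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Logs the full error server-side and returns a message that is safe to expose.
function toPublicError(
  reason: unknown,
  log: (...args: unknown[]) => void = console.error,
): string {
  if (reason instanceof PublicError) return reason.message;
  log("address enrichment failed:", reason);
  return "Failed to enrich address";
}
```

The batch mapping would then use `error: toPublicError(result.reason)`.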


Code Quality

7. drizzle-kit in dependencies Instead of devDependencies

drizzle-kit is a development/deploy tool, but since runMigrations shells out to it at runtime, it ends up needing to be a runtime dependency. This is a symptom of issue #2 above — fixing the migration approach would let drizzle-kit move back to devDependencies and reduce the production image size.

8. Sync Script --limit Is Parsed but Never Applied

src/scripts/sync-top-addresses.ts:38-74

options.limit is stored but never passed to streamTopDelegates or streamTopTokenHolders. Both generators iterate through all pages. This was flagged by @pikonha and is confirmed — the AnticaptureClient generators don't accept a limit parameter at all.
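Since the generators take no limit parameter, one option that avoids changing their signatures is to wrap them. take is a hypothetical helper; leaving a `for await` loop early calls the source generator's return(), so no further pages are requested.

```typescript
// Yields at most `limit` items from `source`, then stops consuming it.
async function* take<T>(source: AsyncIterable<T>, limit: number): AsyncGenerator<T> {
  if (limit <= 0) return;
  let count = 0;
  for await (const item of source) {
    yield item;
    count += 1;
    // Exiting the loop early closes the source, so it stops paging.
    if (count >= limit) return;
  }
}
```

In the sync script, each stream would be wrapped as `take(stream, options.limit)` before iterating.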

9. Sequential Rate Limiting Makes Sync Impractically Slow

await new Promise((resolve) => setTimeout(resolve, 100));

At 100ms per address, processed sequentially, the delay alone adds about 2.8 hours when enriching 100k addresses (100,000 × 0.1 s = 10,000 s), before counting the time the requests themselves take. A concurrency window like the controller's CONCURRENCY = 10 would make this significantly faster; a simple pLimit-style pool would help here.
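A minimal pool-based sketch (mapWithConcurrency is a hypothetical helper; the p-limit package offers the same idea ready-made). Each lane pulls the next item as soon as it finishes, so one slow lookup does not hold up a whole chunk. Any per-request delay needed to respect Arkham's rate limit would still go inside the worker; the actual limit is not stated in this review.

```typescript
// Runs `worker` over `items` with at most `concurrency` calls in flight,
// returning one settled result per item in input order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  concurrency: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<PromiseSettledResult<R>[]> {
  const results = new Array<PromiseSettledResult<R>>(items.length);
  let next = 0;

  async function runLane(): Promise<void> {
    while (next < items.length) {
      // Safe without locks: no await between reading and incrementing `next`.
      const index = next++;
      try {
        results[index] = { status: "fulfilled", value: await worker(items[index], index) };
      } catch (reason) {
        results[index] = { status: "rejected", reason };
      }
    }
  }

  const laneCount = Math.min(Math.max(1, concurrency), items.length);
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}
```

The sequential loop would become something like `mapWithConcurrency(addresses, 10, enrichAddress)`, with failures reported from the rejected entries.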


Summary

| Area | Assessment |
|---|---|
| Architecture | Clean separation of concerns, good use of Zod for external API parsing |
| Tests | None (blocker) |
| runMigrations | execSync + --force at startup is dangerous in production |
| Schema contract | mapToResult doesn't match the nullable arkham schema |
| Sync script | Missing ENS fetch, unsafe address cast, --limit broken, duplicate logic |
| Security | Error leakage in both global handler and batch endpoint |
| Performance | Sync is fully sequential with 100ms delay |

Blocker: Tests and the runMigrations approach. The mapToResult null inconsistency is also a contract bug worth fixing before merge.

pikonha merged commit 4b51bd9 into dev on Feb 20, 2026
10 of 12 checks passed
pikonha deleted the feat/address-enrichment branch on February 20, 2026 19:27

Labels

None yet

Projects

None yet


3 participants